Create an eval-only script for existing ckpts #736
base: main
Conversation
```yaml
# - label: all-small-ppl-validation
#   data:
#     num_workers: 0
#     drop_last: true
#     # generate_doc_lengths: true
#     memmap_dtype: uint32
#     datasets:
#       c4_en-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/c4_en/val/part-0-00000.npy
#       dolma_books-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_books/val/part-0-00000.npy
#       dolma_common-crawl-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_common-crawl/val/part-0-00000.npy
#       dolma_pes2o-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_pes2o/val/part-0-00000.npy
#       dolma_reddit-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_reddit/val/part-0-00000.npy
#       dolma_stack-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_stack/val/part-0-00000.npy
#       dolma_wiki-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_wiki/val/part-0-00000.npy
#       ice-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/ice/val/part-0-00000.npy
#       m2d2_s2orc-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/m2d2_s2orc/val/part-0-00000.npy
#       pile-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/pile/val/part-0-00000.npy
#       wikitext_103-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/wikitext_103/val/part-0-00000.npy
- label: all-small-ppl-validation
  data:
    num_workers: 0
    drop_last: true
    # generate_doc_lengths: true
    memmap_dtype: uint32
    datasets:
      c4_en-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/c4_en/val/part-0-00000.npy
      dolma_books-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_books/val/part-0-00000.npy
      dolma_common-crawl-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_common-crawl/val/part-0-00000.npy
      dolma_pes2o-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_pes2o/val/part-0-00000.npy
      dolma_reddit-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_reddit/val/part-0-00000.npy
      dolma_stack-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_stack/val/part-0-00000.npy
      dolma_wiki-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_wiki/val/part-0-00000.npy
      ice-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/ice/val/part-0-00000.npy
      m2d2_s2orc-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/m2d2_s2orc/val/part-0-00000.npy
      pile-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/pile/val/part-0-00000.npy
      wikitext_103-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/wikitext_103/val/part-0-00000.npy
```
Did you mean to commit this change?
```yaml
- label: copa_rc_0shot
  type: downstream
```
Do we care about any of the 0-shots?
```python
if wandb.run is not None:
    wandb.finish(exit_code=exit_code, quiet=True)
# if wandb.run is not None:
#     wandb.finish(exit_code=exit_code, quiet=True)
```
Debug code?
```python
# train_loader = build_train_dataloader(cfg)
train_loader = None
```
Is this always going to be `None`? If so, we don't need it.
```python
if 'step' in cfg.load_path.split('/')[-1]:
    load_paths = [cfg.load_path]
else:
    # This globbing does not work with remote paths.
```
How is that problem handled then?
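For context, a minimal local-only sketch of the kind of checkpoint discovery the snippet above implies. The `step*` directory layout and the helper name are assumptions, not code from this PR; `pathlib` globbing only walks the local filesystem, so an `s3://` or `gs://` prefix would need a dedicated remote listing instead:

```python
from pathlib import Path

def discover_checkpoints(load_path: str) -> list[str]:
    """Return the single checkpoint given, or every step* checkpoint
    directory underneath load_path (local filesystems only)."""
    if "step" in load_path.rstrip("/").split("/")[-1]:
        return [load_path]

    def step_num(p: Path) -> int:
        digits = "".join(ch for ch in p.name if ch.isdigit())
        return int(digits) if digits else 0

    # Sort numerically so checkpoints are evaluated in training order.
    return [str(p) for p in sorted(Path(load_path).glob("step*"), key=step_num)]
```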
log.info(f"Number of non-embedding parameters: {olmo_model.num_params(include_embedding=False):,d}") | ||
log.info(f"Peak GPU Memory (MB) before {cfg.distributed_strategy}: {int(peak_gpu_memory() or 0)}") | ||
|
||
olmo_model.set_activation_checkpointing(cfg.activation_checkpointing) |
If we only ever eval, we don't need this.
```python
optim = build_optimizer(cfg, dist_model)
scheduler = build_scheduler(cfg)
```
We don't need optimizers and schedulers if we're just evaluating.
So you're creating these only so that you can produce a `Trainer` object?

How hard is it to pull the stuff you need out of the `Trainer` object, so we don't have to do so many things we don't need? It makes me particularly uncomfortable that you're creating a trainer with a `None` data loader, which isn't supposed to work. It just happens to work.
This PR adds `scripts/eval.py`, which evaluates one or more existing ckpts while bypassing the training steps. It seems impossible to backfill evals back to the original wandb run, because `step` must always increase, and rewinding the run would truncate the log, which we don't want. Therefore, this script logs to a new wandb run.
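As a rough sketch of that logging pattern, assuming the standard `wandb` API (the project, group, run name, and metric keys below are illustrative, not necessarily what the script uses):

```python
import wandb

# Backfilled evals go into a fresh run; grouping it with the original training
# run keeps the two side by side in the wandb UI without rewinding (and thereby
# truncating) the training run's history.
run = wandb.init(
    project="olmo",   # illustrative project name
    group="XXX",      # same group as the training run
    name="XXX-eval",  # new, eval-only run
)

# Illustrative results: (checkpoint step, metrics) pairs.
eval_results = [
    (1000, {"eval/c4_en-validation/CrossEntropyLoss": 3.21}),
    (2000, {"eval/c4_en-validation/CrossEntropyLoss": 3.05}),
]
for step, metrics in eval_results:
    # Logging with explicit, increasing steps keeps the eval curves aligned
    # with the checkpoints they came from.
    wandb.log(metrics, step=step)

wandb.finish()
```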
Starting from a training setup:

- Copy the `XXX.sh` file into `XXX-eval.sh`, point it to `scripts/eval.sh`, add the flag `--wandb.group=XXX` to ensure it logs to the same group, and specify `--load_path` to be either a single ckpt or all ckpts under a directory.
- Copy the `XXX-launch.sh` file into `XXX-eval-launch.sh`, change `--task-name` to `XXX-eval`, and change the command so it runs `XXX-eval.sh`.

See an example in `peteish1-eval.sh` and `peteish1-eval-launch.sh`.
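For illustration only, a rough sketch of how dotted overrides like `--wandb.group=XXX` and `--load_path=...` compose with a base config. This assumes OmegaConf-style dotlist parsing; the actual CLI plumbing in the repo may differ, and the path and names below are placeholders:

```python
from omegaconf import OmegaConf

# Base config as it might look in the training setup (illustrative values only).
base = OmegaConf.create({
    "wandb": {"group": None, "name": "XXX"},
    "load_path": None,
})

# The extra flags the eval script needs: log into the training run's group and
# point at either one checkpoint or a directory holding many of them.
overrides = OmegaConf.from_dotlist([
    "wandb.group=XXX",
    "load_path=/path/to/checkpoints/XXX",  # single ckpt, or a parent dir of step* ckpts
])

cfg = OmegaConf.merge(base, overrides)
print(OmegaConf.to_yaml(cfg))
```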